As Yogi Berra said, “Predictions are hard, especially
about the future.” Nevertheless, as a computer guy, I’d like to
offer a few forward-looking observations about the emerging
impact of information technology on scientific research. And
I’d like to ask a couple of questions. Are some of the current
uses of information technology in scientific research redefining
traditional scientific research? Has the computer revolution
produced a “new renaissance,” one that has resulted in the creation of
additional, new paradigms of science?
Scientific Research: How Many Paradigms?


Traditional Scientific Research 

Scientific research refers to a particular 
method for acquiring knowledge about 
natural phenomena. This method has 
two dimensions: one of observation and 
experimentation and one of description and 
explanation. Sometimes, observation 
precedes explanation, and sometimes a 
proposed explanation precedes
experimental confirmation. A scientific
explanation is often made by creating a model
of (some definable part of) reality. As the
statistician George Box observed:
“Essentially, all models are wrong, but some
are useful.” 1 That is, all models are only 
approximations and are subject to being 
superseded by more useful ones. 

The observation and experimentation
dimension of scientific research is
called, appropriately enough, experimental
science; the creation of models to
describe and explain natural phenomena
is called theoretical science. These two
dimensions, experiment and theory, are
sometimes called the two paradigms of
science. We might say that observation/
experiment is the ground of science and
explanation/theory is the superstructure.
A theoretical model is especially
valued if it not only explains previously
observed phenomena but also predicts
new phenomena that are subsequently
observed. Two examples from the history
of science demonstrate the interplay
of observation and explanation.

Traditional Astronomy 

Astronomy provides one example of 
traditional scientific research. Perhaps as 
long ago as 3000 BCE, the Babylonians 
had observed wanderers (i.e., planets) in 
the sky and followed the wanderings 
of the planets among the fixed stars. 
Around 100 CE, Ptolemy created a 
model of the universe that, among other 
things, explained these wanderings. His 
model had a fixed earth at the center of 
the universe, with the sun, the moon, 
and other planets orbiting it in circles. 
Some of those orbits had to be circles 
within circles to explain the “backing 
up” that some planets occasionally did. 

In the 1500s, Nicolaus Copernicus
suggested a different model that put the
sun at the center of the universe and
placed the earth as the third planet orbiting
it (i.e., two planets orbited closer to
the sun, and three were farther away).
One of the reasons this new model was
not immediately accepted was that
observation proved it to be as inaccurate as
the old one. But when Johannes Kepler
modified the model in the early 1600s
by changing the orbits from circles to
ellipses, it became much more accurate.
And it was simpler: no more orbits of
circles within circles. Galileo added
experimental evidence in support of
Kepler’s model using the new “information
technology” called the telescope. As
a culmination of this phase of astronomy,
Sir Isaac Newton’s mathematical laws of
motion and gravitation explained why
the planets followed elliptical orbits,
while the laws also provided equations
that could predict the planets’ motions.

Traditional Biology 

A second example of traditional scientific
research is provided by the biological
sciences. Around 300 BCE, Aristotle
began the work of classifying different
life forms based on their similarities and
differences. The insights of Charles
Darwin and Alfred Russel Wallace in the
19th century explained those similarities
and differences as resulting from evolution
from previous life forms, where
more similar organisms had more recent
common ancestors. A few years later,
Gregor Johann Mendel’s experiments
with peas suggested a mechanism for
inheritance: genes, which were carriers
of inheritable traits. In 1953, Francis
Crick and James Watson identified the
double helix structure of the chemical
DNA in the nucleus of cells and
established that it contained the genes of the
organism. This discovery was followed
by another that explained how cells use
genes to create proteins, from which all
parts of living organisms are formed.
Also, genetic imperfections during
reproduction and at other times can be
seen to foster changes that (occasionally)
produce organisms with better chances
of survival, explaining, at least in part,
how evolution works. This 20th-century
understanding of how life works was at
least as revolutionary as, and probably
much more so than, the 17th-century
understanding of how the solar system
works.

Uses of Information Technology 
in New Scientific Research 

These days, information technology means
electronic information technology. Even
the single word technology seems to mean
electronic information technology (to
the regret of the engineering profession).
Although this article also focuses
on electronic information technology, I
acknowledge the long, important train of
IT developments. For example,
developments in optical information technology,
such as telescopes, microscopes,
and cameras, contributed greatly to the
advancement of science. In addition,
mathematical information technology,
such as mechanical calculators and
tables of logarithms, helped to relieve
the drudgery of performing numerical
calculations, which were increasingly
important in science after Newton. The
mechanical calculator, in particular,
foreshadowed the dramatic advance to
electronic computation.

Computational Science 

Newton’s laws galvanized the use of
mathematical models for scientific
theory in the “hard sciences,” which
became a synonym for a science based
on mathematics. It would be difficult to
overestimate the impact of Newtonian
science on the Western mind.
Before Newton, witches were abroad in
the land, casting spells and killing cattle;
after Newton, we lived in a mechanical
universe where such things were impossible.
But technical problems remained:
having a mathematical equation and
solving it were often two different matters.
A famous example of this difficulty
arose when it was discovered that the
three-body gravitation problem could
not be solved. For example, the equations
for the motions of the sun, the
earth, and the moon have no closed-form
solution, even if no other gravitational
forces are considered.

Consequently, mathematical models
were not always practical by themselves.
This problem began to be solved by
the application of electronic information
technology to scientific research:
namely, the computational simulation of
mathematical models. For example, the
earth-moon-sun gravitation problem
could be simulated by starting with
initial positions and velocities for those
three bodies and computing new positions
and velocities “one time step later”
by computational use of Newton’s laws,
then repeating this process for some
(usually large) number of time steps. Of
course, this process is not exact.
Computers can carry only finite-precision
approximations to real numbers, and
accumulated round-off error can destroy
the accuracy of a computation. Also, the
time step has to be large enough to let the
simulation cover the desired span of time
in a practical number of steps but small
enough to control the discretization error
resulting from using finite time steps
(as opposed to the continuous flow of
time). Since computational
scientists always want to improve
the accuracy of their results and tackle
bigger problems, they have an insatiable
appetite for bigger computers that can
perform massive numbers of these
calculations in a reasonable time.
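
The time-stepping idea can be sketched in a few lines of Python. This is
only a minimal illustration under stated assumptions, not a production
integrator: it works in two dimensions, uses rounded textbook values for
the masses, distances, and speeds, and applies a semi-implicit Euler
update (velocities first, then positions), which is just one of many
possible schemes. All function and variable names are chosen for this
example.

```python
import math

G = 6.674e-11  # gravitational constant, m^3 kg^-1 s^-2


def accelerations(masses, positions):
    """Newtonian gravitational acceleration on each body from all the others."""
    acc = []
    for i, (xi, yi) in enumerate(positions):
        ax = ay = 0.0
        for j, (xj, yj) in enumerate(positions):
            if i == j:
                continue
            dx, dy = xj - xi, yj - yi
            r = math.hypot(dx, dy)
            ax += G * masses[j] * dx / r**3
            ay += G * masses[j] * dy / r**3
        acc.append((ax, ay))
    return acc


def step(masses, positions, velocities, dt):
    """Advance the system by one time step of dt seconds."""
    acc = accelerations(masses, positions)
    # Update velocities from the current accelerations ...
    velocities = [(vx + ax * dt, vy + ay * dt)
                  for (vx, vy), (ax, ay) in zip(velocities, acc)]
    # ... then move each body using its new velocity.
    positions = [(x + vx * dt, y + vy * dt)
                 for (x, y), (vx, vy) in zip(positions, velocities)]
    return positions, velocities


def simulate(masses, positions, velocities, dt, n_steps):
    """Repeat the single-step update n_steps times."""
    for _ in range(n_steps):
        positions, velocities = step(masses, positions, velocities, dt)
    return positions, velocities


def distance(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])


# Assumed, rounded starting values: sun, earth, moon (SI units).
masses = [1.989e30, 5.972e24, 7.348e22]
positions = [(0.0, 0.0), (1.496e11, 0.0), (1.496e11 + 3.844e8, 0.0)]
velocities = [(0.0, 0.0), (0.0, 2.978e4), (0.0, 2.978e4 + 1.022e3)]

# One simulated year in one-hour time steps.
final_pos, final_vel = simulate(masses, positions, velocities,
                                dt=3600.0, n_steps=24 * 365)
earth_sun = distance(final_pos[1], final_pos[0])
earth_moon = distance(final_pos[2], final_pos[1])
print(f"earth-sun distance:  {earth_sun:.3e} m")
print(f"earth-moon distance: {earth_moon:.3e} m")
```

Shrinking dt reduces the error introduced by each finite step but
increases the number of steps, and therefore the computing time and the
opportunity for round-off error to accumulate, which is the trade-off
described above.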

A major step in the use of computational
science was taken when it was
realized that nonmathematical models
could be simulated as well as mathematical
ones. That is, rules other than mathematical
equations could be incorporated
into computer programs and stepped
through simulated time “to see what
happens” to the state of the model. For
example, traffic simulations can predict
where bottlenecks and other problems
may occur. Even simple rules can generate
complex behavior, making it very
valuable to observe the output of the
simulation to understand what the rules
imply. On the other hand, given a set of
data, either observed or human-entered,
can computers help us find rules to
explain the data?
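
A toy example may make the idea of a rule-based simulation concrete.
The sketch below assumes a deliberately simple traffic model (the
cellular automaton often called Rule 184): a one-lane circular road is
a list of cells, and at each time step a car advances one cell if and
only if the cell ahead is empty. It is an illustration of the approach,
not a realistic traffic simulator, and the function names are
hypothetical.

```python
def step(road):
    """One time step on a circular road; road is a list of 0 (empty) or 1 (car)."""
    n = len(road)
    new_road = [0] * n
    for i, cell in enumerate(road):
        if cell:
            ahead = (i + 1) % n
            if road[ahead] == 0:
                new_road[ahead] = 1  # the cell ahead is free: move forward
            else:
                new_road[i] = 1      # blocked: stay put
    return new_road


def flow(road):
    """Number of cars that are able to move on the next step."""
    n = len(road)
    return sum(1 for i in range(n) if road[i] and not road[(i + 1) % n])


def run(road, n_steps):
    for _ in range(n_steps):
        road = step(road)
    return road


light = run([1, 1, 1, 0, 0, 0, 0, 0], 20)  # 3 cars on 8 cells
heavy = run([1, 1, 1, 1, 1, 1, 0, 0], 20)  # 6 cars on 8 cells

print("light traffic:", light, "cars moving:", flow(light))
print("heavy traffic:", heavy, "cars moving:", flow(heavy))
```

With three cars, the initial bunch spreads out and every car moves on
every step. With six cars, only two can move at any time, even though
the rule itself never mentions jams: the bottleneck emerges from
stepping the rule through simulated time.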

Data-Intensive Science 

In a recent international workshop on
cyberinfrastructure for science, the
following statement was made: “Many
fields [of science] have depended on
computational science simulations, and
many now are beginning to depend on
computationally intensive data
analysis.” 2 This statement succinctly expresses
the relatively recent addition of big data
information technology to big computing
information technology in scientific research.
Supercomputers, the fastest computers
available at a given time, have always
produced big data output from simulation
runs, but now big data information
technology also occurs in many other
venues, such as sensor output,
experiment output, and databases.

The reason big data has recently
become important is that disk storage
is now cheap enough that scientists can
afford to store massive amounts of data.
When disk storage was introduced in the
1950s, the cost was about one dollar per
byte. But keeping the cost the same, disk
capacities have doubled every year and
a half, even faster than Moore’s law for
the number of transistors on a chip. We
passed the gigabyte-per-dollar threshold
in the last decade. Later this decade,
a terabyte of disk storage may cost one
dollar!
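
The arithmetic behind these thresholds is easy to check. The sketch
below takes the two figures given above (about one byte per dollar at
the start, and a doubling every year and a half) and computes how long
it takes to reach a gigabyte and a terabyte per dollar. The start year
of 1956 is an assumption made here for illustration (the text says only
“the 1950s”), as are the decimal definitions of gigabyte and terabyte.

```python
import math

DOUBLING_PERIOD_YEARS = 1.5  # from the text: capacity per dollar doubles every 1.5 years
START_YEAR = 1956            # assumption for illustration; the text says "the 1950s"


def years_to_reach(bytes_per_dollar, start_bytes_per_dollar=1.0):
    """Years of steady doubling needed to go from the start value to the target."""
    doublings = math.log2(bytes_per_dollar / start_bytes_per_dollar)
    return doublings * DOUBLING_PERIOD_YEARS


years_to_gigabyte = years_to_reach(1e9)   # about 30 doublings, roughly 45 years
years_to_terabyte = years_to_reach(1e12)  # about 40 doublings, roughly 60 years

print(f"gigabyte per dollar after {years_to_gigabyte:.1f} years "
      f"(around {START_YEAR + years_to_gigabyte:.0f})")
print(f"terabyte per dollar after {years_to_terabyte:.1f} years "
      f"(around {START_YEAR + years_to_terabyte:.0f})")
```

Under these assumptions the gigabyte-per-dollar threshold falls at
about the start of the 2000s and the terabyte-per-dollar threshold in
the middle of the 2010s, consistent with the estimates above.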

EDUCAUSE Review, May/June 2012

Big data storage management is a
research topic in and of itself: we can
currently store more data than we know
how to process. Scientists are learning
how to look for patterns and trends in
massive databases, but other challenges
remain. The February 2012 issue of
Computer focused on an aspect of big data
research related to what is called the CAP
theorem. 3 CAP stands for consistency,
availability, and partition tolerance, and
the theorem states that only two of those
three attributes can be maintained
simultaneously in a massive dataset.
Traditional relational
database systems could maintain all
three attributes, so various methods for
dealing with the two-out-of-three
trade-off are being studied, such as the one
Google uses to enable searching over
the Web. As Google software crawls the
Web to prepare the keyword indices that
enable speedy search results, multiple
copies of the same page may be included
when the page has been updated over
time. To search for such inconsistencies
while creating the database would be
impossibly time-consuming, so the
inconsistencies are tolerated and are then
dealt with at a later stage of the process.

Big data has other ramifications
beyond data-intensive science. The
October 2011 McKinsey Quarterly article
entitled “Are You Ready for the Era of ‘Big
Data’?” asks five questions for businesses
to consider:

1. What happens in a world of radical
transparency, with data widely
available?

2. If you could test all of your decisions, 
how would that change the way you 
compete? 

3. How would your business change 
if you used big data for widespread, 
real-time customization? 

4. How can big data augment or even 
replace management? 

5. Could you create a new business 
model based on data? 4 


The New Astronomy 

Astronomy provides an example of a
science that may be revolutionized by big
data, because astronomers are building
a massive database of all the sky images
produced by many telescopes.
Moreover, these images will be kept in time
sequence, meaning that computer
processing can look for changes taking place
and not just individual moments.

The New Biology

Biological sciences offer another
example of how information technology
is revolutionizing a science. The
application of information technology to
biology has even been given a name:
bioinformatics. In fact, bioinformatics may
be more than just the application of
information technology to biology; it may
involve the co-creation of new biology
and new information technology.
Biology is now recognized as an information
science, founded on the genome
information bases common to all forms of life.

Over the last decade, following the
completion of the human genome
project, exponential improvements in the
process and price of computerizing a
genome have occurred, in part by means
of advanced IT techniques. The first
genome cost millions, if not billions, of
dollars. Now the goal of a thousand-dollar
genome is in sight and may be reached
in a few years. Medical science looks to
the day when every person will have a
computerized genome, which will be
used to personalize medical treatment
in ways that are just coming into view. Of
course, without massive, cheap data
storage and the ability to process it, none of
this would be possible.

The New Social Science 

Sociologists and other social scientists
conduct experiments in which human
behavior is studied. They also
administer surveys that seek to study human
behavior via the responses made by
the subjects. Traditionally, information
technology has been utilized to
implement statistical analyses of such studies.
Recently, William Sims Bainbridge has
argued that a radically new mode of
social science research will be enabled
by information technology: namely, the
use of simulations as an experimental
environment. Electronic games such as
World of Warcraft and other electronic
environments such as Second Life offer
social scientists a broad vista in which to
study human behavior, a vista that
extends well beyond what could be
accomplished ethically in the physical world. 5
(In this case, the simulations themselves
would not be the object of study, as in the
previously discussed uses of
computational simulations, except as pertains to
the human behavior in those simulated
environments.)

Computer Processing of Scientific Literature

The designation “big data” refers to
more than just big size. It can also refer
to big complexity. In fact, any data that
we don’t yet understand well enough to
computerize can be called big data. One
such area of big data is textual
information. Natural language processing (NLP)
has frustrated computer scientists for
decades but is now starting to yield to
computer processing (and translation).
When IBM’s Watson computer defeated
Jeopardy champions in head-to-head
competition, we saw an example of
significant progress in NLP.

An experimental system developed
at the National Library of Medicine
(NLM) provides an example of a
scientific research project involving NLP. This
project has used the MEDLINE database
of titles and abstracts of biomedical
research articles to create a knowledge
base called Semantic MEDLINE. NLM
scientists have processed the MEDLINE
database with three interlocking
IT systems. First, they have employed
a huge thesaurus of biomedical terms,
also developed at NLM, called UMLS
(Unified Medical Language System),
which enables them to choose one term
from each set of synonyms (a controlled
vocabulary). Next, they have developed
an NLP system to find the key sentences
in each abstract and put them into a
standard form. These key sentences
describe the claims of the article in the
form “subject-predicate-object.” Each
key sentence also points back to the
original title-abstract record for later use.
Finally, all the key sentences (currently
60 million extracted from 20 million
MEDLINE abstracts) are used to create
a knowledge base, which is a directed
graph, that is, nodes interconnected
with arrows. (The Web is also a directed
graph, one whose nodes are the web
pages and whose arrows are the embedded
web links that point to other pages.)
In this case, the nodes are the nouns (the
subjects and the objects) from the key
sentences, and the arrows are links
labeled by the verbs (the predicates) from
the key sentences.

This knowledge base is in the form of
the new IT data system called the Semantic
Web. It can be queried and browsed in
ways that are analogous to (but
considerably more powerful than) relational
database systems. New scientific results have
been derived from it by automatically
combining key sentences from multiple
abstracts. Experts conjecture that such
systems will be important new sources of
scientific results in the future, especially
for interdisciplinary studies.
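
The structure of such a knowledge base can be sketched with a few
lines of Python. The triples below are invented placeholders, not
records from Semantic MEDLINE, and the functions are hypothetical
names used only for this illustration; the real system is far larger
and relies on UMLS and an NLP pipeline that this sketch does not
attempt to reproduce. The point is only to show how
subject-predicate-object sentences become a labeled directed graph,
and how following two arrows in a row combines claims that come from
different abstracts.

```python
from collections import defaultdict

# Hypothetical key sentences: (subject, predicate, object, source record).
triples = [
    ("drug_A", "INHIBITS", "protein_B", "abstract-1"),
    ("protein_B", "CAUSES", "disease_C", "abstract-2"),
    ("drug_D", "TREATS", "disease_C", "abstract-3"),
]


def build_graph(triples):
    """Directed graph: each subject node maps to its labeled outgoing arrows."""
    graph = defaultdict(list)
    for subj, pred, obj, source in triples:
        graph[subj].append((pred, obj, source))
    return graph


def two_step_chains(graph, start):
    """Pairs of linked key sentences that begin at start and come from different sources."""
    chains = []
    for pred1, middle, src1 in graph.get(start, []):
        for pred2, end, src2 in graph.get(middle, []):
            if src1 != src2:
                chains.append(((start, pred1, middle, src1),
                               (middle, pred2, end, src2)))
    return chains


graph = build_graph(triples)
chains = two_step_chains(graph, "drug_A")
for first, second in chains:
    print(first, "->", second)
```

Here the chain from drug_A through protein_B to disease_C is stated in
no single abstract; it appears only when the two key sentences are
joined, and each half still points back to its source record so that a
human can check the original claim.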

Is Information Technology 
Redefining Scientific Research? 

In The New Renaissance, Douglas S.
Robertson asserted that computing is
creating a renaissance as great as the three
that preceded it: namely, those created
by spoken language (500,000-50,000
years ago), written language (5,000 years
ago), and printing (500 years ago). 6 If he’s
right, a revolution in scientific research
will be simply a part of the legacy of
electronic information technology.

Indeed, the use of information
technology is propelling scientific research
forward in many ways, including ways not
discussed here (e.g., citizen science and
enhanced collaboration). Previous
inventions such as the telescope and the
microscope likewise propelled astronomy and
biology forward, but these tools merely
enhanced the observation/experiment
paradigm. 7 And new mathematics such
as calculus in the 17th century and game
theory in the 20th century merely
enhanced the explanation/theory paradigm.

On the other hand, computational
modeling and simulation has been called
“the third paradigm” of science by
observers who think it adds something
beyond the paradigms of experiment and
theory. Global weather simulations are so
faithful to reality that hurricanes are
spawned (by the simulation itself) when
the conditions are right. And simulations
of the interior of the earth suggested
that the inner core was rotating
more slowly than the outer core, even
though there was neither data nor theory
to suggest that this was happening (later
analysis of earthquake seismic records
confirmed that the simulation was
correct). Computational simulation would
seem to be theoretical science in that it
involves the construction and manipulation
of a model of a part of reality. But
what manipulation! Churning through
centuries of climate simulation in hours,
for example, is certainly theory on
steroids.

More recently, big data has been
called “the fourth paradigm” of science.
Big data can be observed, at least by
computers processing it and often by
humans reviewing visualizations created
from it. In the past, humans had to reduce
the data, often with statistical processing,
to be able to make sense of it. Perhaps
new big data processing techniques will
help us make sense of it without
traditional reduction. One of the goals of big
data discussed in the book The Fourth
Paradigm is to make the scientific record a
first-class scientific object. 8 As discussed
above, Semantic MEDLINE is one step
in that direction for textual information.
Perhaps we will see techniques for doing
likewise for tabular information.

Conservative observers might say that
computation and big data make only
quantitative changes in what we can do,
not qualitative ones. Liberal observers
might respond that large-enough
quantitative changes can produce qualitative
change. The printing press, for example,
produced only a quantitative change in
the time and cost required to make many
copies of a document. Yet the qualitative
changes that this quantitative change
engendered in Western civilization were
huge, probably enabling science itself.

So, maybe there are only two
paradigms of science. But maybe there are
four. Or then again, maybe there are
three! In the last chapter of his book
Phase Change, Douglas S. Robertson
shifts from considering phase changes in
scientific disciplines to considering the
scientific method. He suggests the three
paradigms (though he doesn’t call them
that) of collecting, compressing, and
organizing information. Collecting
information is clearly another term for
observation and experimentation. Compressing
information encompasses both traditional
theory (e.g., mathematical modeling)
and computational simulation (the
compressed information is the simulation
program). Organizing information refers to
big data that can’t be compressed.

These are especially exciting times for
science, and that excitement is due in no
small measure to the effects that
information technology is having on scientific
disciplines and on the scientific method
itself. In keeping with Yogi’s dictum, it
may be hard to predict what will be the
most useful way of re-characterizing the
scientific method in the future, when the
dust kicked up by information
technology settles. ■